Modified Forms of Hepatitis C NS3 Protease for Facilitating Inhibitor Screening and Structural Studies of Protease:Inhibitor Complexes



This application claims priority from provisional U.S. Application Serial No. 60/115,271, filed January 8, 1999, which is incorporated herein by reference in its entirety.

Technical Field of the Invention 

The present invention relates to modified forms of the Hepatitis C NS3 protease. The wild-type protease is essential in vivo for replication of Hepatitis C virus. The novel proteins of this invention are useful for screening for inhibitors of the protease and for structural studies of the protease and of protease:inhibitor complexes.

Background of the Invention 

Hepatitis C virus (HCV) infection is the suspected cause of 90% of all cases of non-A, non-B hepatitis (Choo et al., 1989; Kuo et al., 1989). HCV infection is more common than HIV infection, with an incidence rate of 2-15% worldwide. Over 4 million people are infected with HCV in the United States alone. While primary infection with HCV is often asymptomatic, almost all HCV infections progress to a chronic state that persists for decades. A staggering 20-50% of infected individuals are thought to eventually develop chronic liver disease (e.g., cirrhosis), and 20-30% of these cases will lead to liver failure or liver cancer. Up to 12,000 people in the U.S. will die this year from sequelae associated with HCV infection. As the current population ages over the next two decades, the morbidity and mortality associated with HCV are expected to triple. The development of safe and effective treatment(s) for HCV infection is a major unmet medical need.

The established principle for antiviral intervention is the direct inhibition of essential, virally encoded enzymes. The only approved treatment for HCV infection, however, is interferon, which affects HCV infection indirectly by altering the host immune response. Interferon treatment is largely ineffective, as a sustained antiviral response is produced in less than 30% of treated patients. A safe and effective antiviral treatment that blocks viral replication directly would likely have a much more beneficial impact on public health than interferon treatment does. No such inhibitors of HCV replication have been disclosed to date. Vaccination to prevent HCV disease has not shown promise, owing to the lack of efficacy of vaccine candidates for HCV.

Hepatitis C virus is a positive-strand RNA virus of the family Flaviviridae. The HCV genome encodes a single polyprotein of 3033 amino acids, of which residues 1026 to 1657 (631 amino acids) represent the NS3 protein (Choo et al., 1991). The HCV NS3 protein is a site-specific protease that cleaves the HCV polyprotein selectively at four sites related by their primary amino acid sequences (Grakoui et al., 1993a). These cleavages give rise to the mature non-structural (replicative) proteins of HCV, including NS3, NS4A, NS4B, NS5A, and NS5B (Bartenschlager et al., 1993; Grakoui et al., 1993b; Hijikata et al., 1993a,b; Tomei et al., 1993; Bartenschlager et al., 1994; Eckart et al., 1994; Lin et al., 1994; Manabe et al., 1994). Genetic studies have demonstrated that the homologous NS3 proteases of related viruses (e.g., Yellow Fever Virus and Bovine Viral Diarrhea Virus) are absolutely essential for viral replication (Chambers et al., 1990; Xu et al., 1997). Thus, inhibitors of NS3 protease should inhibit HCV replication and would be useful for the discovery and development of effective antiviral treatments for HCV infection.
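The numbering conventions used in this application (defined in the Definitions section: NS3 residue 1 corresponds to polyprotein residue 1026, and NS4a residue 1 corresponds to polyprotein residue 1658) reduce to fixed offsets. The following Python sketch converts between the two coordinate systems; the function names are hypothetical helpers introduced only for illustration.

```python
# Offsets implied by the numbering conventions in the Definitions section:
# NS3 residue 1 corresponds to polyprotein residue 1026, and
# NS4a residue 1 corresponds to polyprotein residue 1658.
NS3_START = 1026
NS4A_START = 1658


def ns3_to_polyprotein(residue: int) -> int:
    """Map an NS3-relative residue number to HCV polyprotein numbering."""
    if residue < 1:
        raise ValueError("NS3 numbering starts at residue 1")
    return residue + NS3_START - 1


def polyprotein_to_ns3(position: int) -> int:
    """Map an HCV polyprotein position back to NS3-relative numbering."""
    if position < NS3_START:
        raise ValueError("position precedes the start of NS3")
    return position - NS3_START + 1


def ns4a_to_polyprotein(residue: int) -> int:
    """Map an NS4a-relative residue number to HCV polyprotein numbering."""
    if residue < 1:
        raise ValueError("NS4a numbering starts at residue 1")
    return residue + NS4A_START - 1


# The protease domain (NS3 residues 1-181) and the NS4a segment used in the
# fusions (NS4a residues 21-31), expressed in polyprotein coordinates:
print(ns3_to_polyprotein(1), ns3_to_polyprotein(181))    # 1026 1206
print(ns4a_to_polyprotein(21), ns4a_to_polyprotein(31))  # 1678 1688
```

Keeping the offsets as named constants makes it easy to adapt the sketch if a different strain or numbering reference is used.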

Efficient processing of the HCV polyprotein by NS3 also requires the NS4A protein, amino acids 1658-1712 (58 amino acids) of the HCV polyprotein (Bartenschlager et al., 1994; Overton et al., 1994; Bartenschlager et al., 1995; Bouffard et al., 1995; Tanji et al., 1995). NS4A stimulates protease activity through the formation of a heteromeric complex with NS3 (Bartenschlager et al., 1995; Lin et al., 1995; Satoh et al., 1995). NS4A is also thought to target the NS3 protease to the ER membrane, the likely site of viral replication (Hijikata et al., 1993b; Lin and Rice, 1995; Tanji et al., 1995). Studies to map the functional domains of NS3 and NS4A have demonstrated that the protease catalytic domain of NS3 resides within amino acids 1-181 (Bartenschlager et al., 1994; Tanji et al., 1994; Failla et al., 1995; Shoji et al., 1995) and that the catalytic domain interacts with, and is stimulated by, NS4A (Hijikata et al., 1993a; Lin et al., 1994; Bartenschlager et al., 1995; Failla et al., 1995; Satoh et al., 1995; Tanji et al., 1995). The remaining 450 amino acids of NS3 comprise a functional domain with helicase and ATPase activities, which are thought to be involved in viral genome replication (Jin and Peterson, 1995). Functional studies of NS4A in vitro demonstrated that the protease-stimulatory activity maps to amino acids 21-34 of NS4A (Lin et al., 1995; Tomei et al., 1995; Shimizu et al., 1996). The N-terminal 20 amino acids of NS4A, on the other hand, are largely hydrophobic and might serve as a transmembrane anchor domain (Lin and Rice, 1995).

The three-dimensional structure of the protease catalytic domain of NS3 has been determined by X-ray crystallography, with and without a cofactor peptide from NS4A (Kim et al., 1996; Love et al., 1996; Yan et al., 1998). These structures revealed very strong structural homology to chymotrypsin-like serine protease domains, with the canonical catalytic triad comprising Ser-139, His-57, and Asp-81. The N-terminal 28 amino acids of NS3 are unique, however: they are unstructured in the absence of NS4A, whereas in the presence of NS4A peptide this region adopts β-strand and α-helix secondary structures. The co-crystal structure revealed that the NS4A peptide is inserted into, and partially buried by, adjacent β-strands of NS3. Local rearrangements near the protease active site also occur as a result of NS4A binding, and these are thought to render the protease more catalytically active. Thus, NS4A would be expected to stabilize the active conformation of the HCV protease.

Near the N-terminus of NS3 is an α-helix spanning residues 13-21 (α-helix 0) that appears to be stabilized by the NS4A peptide. The external face of this helix is very hydrophobic and consists entirely of branched aliphatic residues. Due to its hydrophobic nature, it has been speculated that this surface might be involved in additional membrane interactions that anchor the NS3:NS4A complex to cytoplasmic membranes (Yan et al., 1998).
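Why a single face of α-helix 0 can be uniformly hydrophobic becomes clear when residues 13-21 are placed on a helical wheel, as in Figure 1. The Python sketch below assumes idealized α-helix geometry (3.6 residues per turn, i.e. 100° of rotation per residue) and uses only the hydrophobic positions named in this document (Leu13, Leu14, Ile17, Ile18, Leu21). It is illustrative and does not reproduce the figure itself.

```python
# Idealized alpha-helix (assumption): 3.6 residues per turn -> 100 degrees per residue.
DEGREES_PER_RESIDUE = 100
HELIX0 = range(13, 22)                 # NS3 residues 13-21 (alpha-helix 0)
HYDROPHOBIC = {13, 14, 17, 18, 21}     # Leu13, Leu14, Ile17, Ile18, Leu21


def wheel_angle(residue: int, first: int = 13) -> int:
    """Angular position (degrees, 0-359) of a residue on the helical wheel."""
    return ((residue - first) * DEGREES_PER_RESIDUE) % 360


angles = {r: wheel_angle(r) for r in HELIX0}
hydrophobic_angles = sorted(angles[r] for r in HYDROPHOBIC)
other_angles = sorted(angles[r] for r in HELIX0 if r not in HYDROPHOBIC)

print(hydrophobic_angles)  # [0, 40, 80, 100, 140]
print(other_angles)        # [200, 240, 300, 340]
```

Under this idealized geometry, all five hydrophobic positions fall within a 140° arc on one side of the helix, while the remaining positions occupy the opposite side; this is the face that the hydrophilic substitutions described below target.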

Routine methods for the expression of recombinant NS3 protease (e.g., in E. coli or baculovirus systems) have been employed widely. A common problem encountered when expressing wild-type NS3 protease (either full-length or the truncated catalytic domain) has been the production of insoluble or poorly soluble protein, especially when using E. coli vector systems. The best systems described to date have produced low levels of recombinant wild-type protease, and the protease tends to be poorly soluble (Shoji et al., 1995; Suzuki et al., 1995; Hong et al., 1996; Steinkuhler et al., 1996). Because many of these preparations are enzymatically active, this approach has sufficed to generate active enzyme for activity analysis and inhibitor screening. Structural studies, however, require highly expressed enzymes that are not only enzymatically active but also highly soluble and low in aggregation.

Efforts have been made to overcome problems associated with low expression and/or poor solubility of the HCV protease by constructing genetically engineered fusion derivatives of the native NS3 protease domain. Most notable is the generation of NS3 protease catalytic domains that form slowly growing crystals suitable for structure determination by X-ray crystallography (Love et al., 1996; Kim et al., 1996; Yan et al., 1998). These efforts have involved the construction of genetically engineered derivatives of NS3 by fusing polypeptide tags (e.g., basic amino acids, poly-histidine) to the N-terminus and/or C-terminus that enhance the stable expression and/or solubility of the expressed protein. Other types of protease fusions (e.g., with ubiquitin, glutathione-S-transferase, or maltose binding protein), including fusion of the NS4A protein to the C-terminus of the protease catalytic domain (Inoue et al., 1998), have been described that are partly soluble when expressed in E. coli, but few if any of these have overcome the critical limitation of low overall solubility. Very recently, bacterial expression of constructs in which the NS4a segment is fused to the N-terminus of the NS3 protease has been reported (Taremi et al., 1998; Pasquo et al., 1998); however, the overall solubility of the final preparations was not reported.

There has been no published report of an NS3 preparation that is suitable for protein NMR work, as NMR studies typically require protein preparations that are expressed at high levels, are very highly soluble (>1 mM), and do not form soluble aggregates when purified. In addition, no X-ray structures of HCV protease complexed with enzyme inhibitors have been reported to date.

Summary of the Invention

At this time, no known pharmaceutical agent is available to prevent or cure HCV infection. HCV replication is dependent upon the activity of the virally encoded NS3 protease. Thus, identification of a specific inhibitor of this protease activity would be useful for the discovery of drugs that block HCV replication. This can be achieved using one or a combination of methods including, but not limited to: screening for small-molecule inhibitors to serve as leads for medicinal chemistry; and analysis of the three-dimensional structures (by X-ray crystallography or NMR) of complexes between the HCV NS3 protease and compounds that bind to it, in order to gain insight into how the compounds might be chemically modified to produce potent inhibitors of this viral protease.

This invention enables the discovery of drugs that prevent or cure HCV infection. This invention encompasses novel, highly soluble, modified forms of HCV NS3 protease. More specifically, this invention includes novel, highly soluble, modified HCV NS3 proteases and novel, highly soluble, modified HCV NS3-NS4a fusion proteases. These novel proteins greatly facilitate screening for small-molecule inhibitors and analysis of the three-dimensional structures, through X-ray crystallography and NMR spectroscopy, of complexes between the HCV NS3 protease and compounds that bind to it.

The present invention results from a number of significant modifications made 
to the wild-type HCV protease sequence. 

One aspect of the invention is a modified HCV NS3 protease comprising an HCV NS3 protease comprising at least one substitution in the HCV NS3 protease of a hydrophobic α-helix 0 amino acid residue to a hydrophilic amino acid residue.

Another aspect of the invention is a modified HCV NS4a-NS3 fusion protease 
comprising a modified HCV NS3 protease fused to a HCV NS4a or modified HCV 
NS4a. 

Further aspects of the invention are nucleic acid molecules encoding the proteins of the present invention, vectors, and host cells.

A further aspect of the invention comprises methods of making the proteins of the present invention.

Brief Description of the Figures

• Figure 1. Helical wheel representations of α-helix 0 (residues 13-21) of HCV protease. Residues in bold font are solvent exposed. (Top) Amino acid sequence of wild-type α-helix 0; (Left) wild-type α-helix 0; (Right) amino acid substitutions in the α-helix 0 variants (X = His, Lys, Glu, Gln, Asp, or Asn) (see Examples 3 and 4).

• Figure 2. Diagram of construction of modified HCV NS4a-NS3 fusion proteases. In these diagrams, NS4a residues 21-31 are fused to the N-terminus of NS3 by way of a linker. "X" denotes a sequence change relative to SEQ ID NO:3. "N" and "C" denote the N- and C-termini of the constructs, respectively.

• Figure 3. Diagram of bacterial selection scheme for obtaining soluble HCV protease mutants. See Example 4 for a detailed description of the system. (Center) Expression plasmid (expressing HCV NS4a-NS3 fusion protease and modified Tet repressor) and chromosomally encoded Tet promoter-CAT (chloramphenicol acetyltransferase) gene fusion; (Right) case in which the NS3 protease is insoluble (activity masked by insolubility of the protease, resulting in chloramphenicol-sensitive bacteria); (Left) case in which the NS3 protease is soluble (protease is active, resulting in chloramphenicol-resistant bacteria).

• Figure 4. SDS-PAGE analysis of expression of various HCV NS4a-NS3 fusion protein constructs. Plasmid-containing cells were grown to an OD600 of approximately 0.7, and 10 ml cultures were induced with 0.25 mM IPTG for 20 hours at 20 °C. Cells were harvested by centrifugation (1500 rcf) in a tabletop microfuge, and cell pellets were resuspended in 1 ml of 25 mM Na-phosphate buffer, pH 7.5, 0.5 M NaCl, 2 mM DTT, 10 µM ZnCl2, 10 mM MgCl2, 10 µg/ml DNase and sonicated twice for 1 min at power 5 in pulse mode. The homogenates were spun down in a tabletop microfuge at maximum speed (20,800 rcf) for 20 min. Homogenates and supernatants were analyzed on 10-20% SDS-PAGE pre-cast gels (Bio-Rad). Lane 1, molecular weight standards. The following samples are in pairs of homogenate and supernatant, respectively: Lanes 2 & 3, parental fusion (SEQ ID NO:3); Lanes 4 & 5, helix0-1 mutations only (SEQ ID NO:12); Lanes 6 & 7, optimized linker only (SEQ ID NO:24); Lanes 8 & 9, helix0-1 mutations with optimized linker (SEQ ID NO:14).

• Figure 5. NMR analysis of modified HCV NS4a-NS3 fusion proteases with and without an optimized linker. 2D 1H-15N HSQC spectra were obtained for 15N-labeled mutant HCV NS4a-NS3 fusion proteases (all having the helix0-1 sequence [see Figure 11, SEQ ID NO:6] and purified as outlined in Example 7). Panel A, with non-optimized linker (see Example 4, SEQ ID NO:12); Panel B, with optimized linker (see Example 5, SEQ ID NO:14); Panel C, with optimized linker and A40T, I72T, P86Q, C47S, C52L, C159S mutations (see Example 6, SEQ ID NO:18).

• Figure 6 shows an alignment of the amino acid sequences of SEQ ID NOS: 1, 3, 12, 14, 16, 18, 20, 22, and 24. Bolded letters with stippling indicate residue positions that are mutated relative to SEQ ID NO:1.

• Figure 7. Overlaid 1H-15N HSQC NMR spectra of a modified HCV NS4a-NS3 fusion protease with optimized linker (SEQ ID NO:18) in apo form and complexed with a peptide inhibitor (see Example 9). Apo-protease (thin grey line), peptide-complexed protease (thick black line).

• Figure 8. Portion of the electron density map of a modified HCV NS4a-NS3 fusion protease (SEQ ID NO:18) complexed with a peptide inhibitor (see Example 10). Two residues of the peptide inhibitor are shown: Cys1 and Cha2.

• Figure 9. Amino acid sequence of (SEQ ID NO: 1) and nucleic acid sequence encoding (SEQ ID NO: 2) the parental non-fusion wild type HCV NS3 protease sequence.

• Figure 10. Amino acid sequence of (SEQ ID NO: 3) and nucleic acid sequence 
encoding (SEQ ID NO: 4) the initial HCV NS4a-NS3 fusion protease. 

• Figure 11. Amino acid sequence (SEQ ID NO: 5) of the α-helix 0 region of wild type HCV NS3 protease, and amino acid sequences of (SEQ ID NOS: 6-11) α-helix 0 regions (helix0-1, helix0-3, helix0-4, helix0-7, helix0-8, and helix0-10, respectively) of various soluble modified HCV NS4a-NS3 fusion proteases that are resistant to high levels of chloramphenicol in the bacterial selection scheme (see Example 4).

• Figure 12. Amino acid sequence of (SEQ ID NO: 12) and nucleic acid sequence encoding (SEQ ID NO: 13) a modified HCV NS4a-NS3 fusion protease with the α-helix 0 variant sequence helix0-1.

• Figure 13. Amino acid sequence of (SEQ ID NO: 14) and nucleic acid sequence encoding (SEQ ID NO: 15) a modified HCV NS4a-NS3 fusion protease with the α-helix 0 variant sequence helix0-1 and an optimized linker sequence.

• Figure 14. Amino acid sequence of (SEQ ID NO: 16) and nucleic acid sequence encoding (SEQ ID NO: 17) a modified HCV NS4a-NS3 fusion protease with the α-helix 0 variant sequence helix0-1, an optimized linker sequence, and surface mutations.

• Figure 15. Amino acid sequence of (SEQ ID NO: 18) and nucleic acid sequence encoding (SEQ ID NO: 19) a modified HCV NS4a-NS3 fusion protease with the α-helix 0 variant sequence helix0-1, an optimized linker sequence, surface mutations, and cysteine mutations.

• Figure 16. Amino acid sequence of (SEQ ID NO: 20) and nucleic acid sequence encoding (SEQ ID NO: 21) a modified HCV NS4a-NS3 fusion protease with the α-helix 0 variant sequence helix0-7, an optimized linker sequence, surface mutations, and cysteine mutations.

• Figure 17. Amino acid sequence of (SEQ ID NO: 22) and nucleic acid sequence encoding (SEQ ID NO: 23) a modified HCV NS4a-NS3 fusion protease with the α-helix 0 variant sequence helix0-7, an optimized linker sequence, surface mutations, cysteine mutations, and the C16T mutation.

• Figure 18. Amino acid sequence of (SEQ ID NO: 24) and nucleic acid sequence encoding (SEQ ID NO: 25) an NS4a-NS3 fusion protein with wild-type α-helix 0 sequence and optimized linker sequence.
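The readout of the bacterial selection in Figure 3 is a simple chain of conditions: only a clone whose protease is soluble, and therefore active, yields chloramphenicol-resistant bacteria. The Python sketch below encodes that logic. It assumes, as a simplification, that an active protease inactivates the modified Tet repressor and thereby derepresses the Tet promoter-CAT fusion (the actual mechanism is described in Example 4); the `Clone` type and the clone names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """A bacterial clone expressing one NS4a-NS3 fusion protease variant."""
    name: str
    soluble: bool
    catalytically_competent: bool = True


def chloramphenicol_resistant(clone: Clone) -> bool:
    """Model of the Figure 3 readout (a simplification, not the actual assay).

    Assumption: soluble, catalytically competent protease inactivates the
    modified Tet repressor, derepressing the chromosomal Tet promoter-CAT
    fusion; insoluble protease is masked and leaves the repressor intact.
    """
    protease_active = clone.soluble and clone.catalytically_competent
    repressor_inactivated = protease_active
    cat_expressed = repressor_inactivated
    return cat_expressed


library = [
    Clone("insoluble variant", soluble=False),
    Clone("soluble variant", soluble=True),
]
survivors = [c.name for c in library if chloramphenicol_resistant(c)]
print(survivors)  # ['soluble variant']
```

One consequence of this design is that a soluble but catalytically dead variant is also lost, so the selection enriches for variants that are both soluble and active.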

Definitions 

The following definitions are provided to more clearly delineate what is contemplated in this invention.

• "HCV" refers to the hepatitis C virus. 
• "HCV NS3" refers to the protein fragment of the HCV polyprotein from any wild type strain of HCV that corresponds to residues 1026-1657 of the HCV polyprotein (as defined in Choo et al., Proceedings of the National Academy of Sciences USA 88, 2451-2455 [1991]). The numbering convention for HCV NS3 throughout this application starts with residue 1 corresponding to residue 1026 of the HCV polyprotein, which is the first amino acid residue of the mature processed NS3 protein fragment. HCV NS3 has portions that confer protease activity, helicase activity, and ATPase activity.

• "HCV NS3 protease" refers to any portion of the wild type HCV NS3 that has protease activity, not restricted to, but commonly associated with, the HCV NS3 protease domain; or any wild type peptide that exhibits the protease activity associated with HCV NS3.

• "HCV NS3 protease domain" refers to the portion of wild type HCV NS3 that confers protease activity, usually encompassing HCV NS3 residues 1-181, but sometimes differing by the inclusion or deletion of residues at either the N- or C-terminus.

• "Modified HCV NS3 protease" refers to a peptide or protein whose sequence is an alteration from a wild-type HCV NS3 protease sequence and that exhibits the protease activity of HCV NS3 protease. Such modifications include, but are not limited to, naturally-occurring amino acid substitutions, non-naturally-occurring amino acid substitutions, conservative amino acid substitutions, amino acid insertions, amino acid deletions, and amino acid additions. Non-sequence modifications, including changes in acetylation, methylation, phosphorylation, carboxylation, or glycosylation, are also included in the definition of modified.

• "HCV NS4a" refers to the protease-stimulating protein fragment of the HCV polyprotein from any wild type strain of HCV that corresponds to residues 1658-1712 of the HCV polyprotein (as defined in Choo et al., Proceedings of the National Academy of Sciences USA 88, 2451-2455 [1991]), any fragment thereof that exhibits protease-stimulating activity, or any wild type peptide that exhibits the protease-stimulating activity associated with residues 1658-1712 of the HCV polyprotein. Full-length HCV NS4a particularly refers to residues 1-58, which correspond to residues 1658-1712 of the polyprotein. The numbering convention throughout this invention for HCV NS4a starts with residue 1 corresponding to residue 1658 of the HCV polyprotein (the same as the first residue of the mature processed HCV NS4a fragment).

• "Modified HCV NS4a" refers to a peptide or protein whose sequence is an alteration from a wild-type HCV NS4a sequence and that exhibits the protease-stimulating activity of HCV NS4a. Such modifications include, but are not limited to, naturally-occurring amino acid substitutions, non-naturally-occurring amino acid substitutions, conservative amino acid substitutions, amino acid insertions, amino acid deletions, and amino acid additions. Non-sequence modifications, including changes in acetylation, methylation, phosphorylation, carboxylation, or glycosylation, are also included in the definition of modified.

• "Modified HCV NS4a-NS3 fusion protease" refers to a modified HCV NS3 protease fused to an HCV NS4a or modified HCV NS4a. A modified HCV NS4a-NS3 fusion protease may include an optimized linker sequence.

• "Modified forms of HCV NS3 protease" refers to the totality of the invention 
described herein and encompasses modified HCV NS3 proteases, modified HCV 
NS4a-NS3 fusion proteases, or both. 

• "Naturally-occurring amino acid" refers to any of the 20 standard L-amino acids that occur in a referred-to position in any wild type HCV NS3 protease or wild type HCV NS4a.

• "Non-naturally-occurring amino acid" refers to any of the 20 standard L-amino acids that do not occur in a referred-to position in any wild type HCV NS3 protease or wild type HCV NS4a, as well as D-amino acids and synthetic amino acids such as β- or γ-amino acids.

• "Conservative amino acid substitution" refers to the substitution of one amino acid for another with similar characteristics, e.g., substitutions within the following groups: valine and glycine; glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine and glutamine; serine and threonine; lysine and arginine; and phenylalanine and tyrosine. Other conservative amino acid substitutions can be taken from Table 1 below.

Table 1

Conservative amino acid replacements

Amino Acid | Code | Replace with any of
Alanine | A | D-Ala, Gly, beta-Ala, L-Cys, D-Cys
Arginine | R | D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg, Met, Ile, D-Met, D-Ile, Orn, D-Orn
Asparagine | N | D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid | D | D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine | C | D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr
Glutamine | Q | D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid | E | D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine | G | Ala, D-Ala, Pro, D-Pro, beta-Ala, Acp
Isoleucine | I | D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met
Leucine | L | D-Leu, Val, D-Val, Met, D-Met, Ile, D-Ile
Lysine | K | D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg, Met, D-Met, Ile, D-Ile, Orn, D-Orn
Methionine | M | D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu, Val, D-Val
Phenylalanine | F | D-Phe, Tyr, D-Thr, L-Dopa, His, D-His, Trp, D-Trp, trans-3,4, or 5-phenylproline, cis-3,4, or 5-phenylproline
Proline | P | D-Pro, L-1-thioazolidine-4-carboxylic acid, D- or L-1-oxazolidine-4-carboxylic acid
Serine | S | D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met, Met(O), D-Met(O), L-Cys, D-Cys
Threonine | T | D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Met(O), D-Met(O), Val, D-Val
Tyrosine | Y | D-Tyr, Phe, D-Phe, L-Dopa, His, D-His
Valine | V | D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met
• "Hydrophobic amino acid" refers to amino acid residues whose sidechains are relatively non-polar, including, but not limited to, alanine, phenylalanine, isoleucine, leucine, methionine, proline, valine, and tryptophan.

• "Hydrophilic amino acid" refers to amino acid residues whose sidechains are relatively polar, including, but not limited to, aspartate, glutamate, lysine, asparagine, glutamine, arginine, serine, threonine, histidine, and tyrosine.

• "α-helix 0" refers to the sequence consisting of HCV NS3 residues Leu13 through Leu21 that takes on an alpha-helical structure when HCV NS3 protease is complexed with an HCV NS4a segment (as in Kim et al., Cell 87, 343-355 [1996] or Yan et al., Protein Science 7, 837-847 [1998]).

• "Linker" refers to, in a modified NS4a-NS3 or NS3-NS4a fusion protease, a 
polypeptide sequence that joins the HCV NS4a sequence with the HCV NS3 
sequence. 

• "Optimized linker" refers to, in a modified NS4a-NS3 fusion protease, a linker sequence that joins the NS4a and NS3 sequences such that the resulting fusion protein has enhanced stability and solubility characteristics relative to a non-optimized linker.

• "Zinc-binding cysteine residues" refer to the naturally-occurring cysteine residues Cys97, Cys99, and Cys145 in HCV NS3.

• "Non-zinc-binding cysteine residues" refer to the naturally-occurring cysteine residues Cys47, Cys52, and Cys159 in HCV NS3.
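The residue classes defined above can be expressed as lookup tables. The Python sketch below encodes the hydrophobic and hydrophilic examples and the conservative substitution groups listed in the definitions (standard L-amino acids only; it ignores the D-amino acid and synthetic replacements of Table 1) and classifies a single substitution. The function names are hypothetical, and because the definitions' lists are "not limited to", the sets are illustrative rather than exhaustive.

```python
# One-letter codes from the "Hydrophobic amino acid" and
# "Hydrophilic amino acid" definitions (both lists are non-exhaustive).
HYDROPHOBIC = set("AFILMPVW")
HYDROPHILIC = set("DEKNQRSTHY")

# Conservative substitution groups listed in the definitions
# (standard L-amino acids only; Table 1 adds D- and synthetic residues).
CONSERVATIVE_GROUPS = [
    {"V", "G"}, {"G", "A"}, {"V", "I", "L"}, {"D", "E"},
    {"N", "Q"}, {"S", "T"}, {"K", "R"}, {"F", "Y"},
]


def is_conservative(old: str, new: str) -> bool:
    """True if old -> new falls within one of the listed groups."""
    return any(old in group and new in group for group in CONSERVATIVE_GROUPS)


def is_hydrophobic_to_hydrophilic(old: str, new: str) -> bool:
    """True if a hydrophobic residue is replaced by a hydrophilic one."""
    return old in HYDROPHOBIC and new in HYDROPHILIC


print(is_conservative("D", "E"))                # True
print(is_conservative("L", "E"))                # False
print(is_hydrophobic_to_hydrophilic("L", "E"))  # True
```

Note that the hydrophobic-to-hydrophilic substitutions used in α-helix 0 are, by construction, not conservative substitutions in this sense.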

Detailed Description of the Invention 

The benefit of this invention is that the modified forms of HCV NS3 protease retain full activity, yet are highly amenable to biochemical experimentation because of their highly soluble (>30 mg/ml) and non-aggregating nature under detergent-free conditions. In contrast, wild-type forms of the HCV NS3 protease (domain) require detergents for solubilization. Because the modified forms of HCV NS3 protease of the invention exhibit very high degrees of solubility in the absence of detergents, they are well suited for NMR and X-ray crystallographic structure determination of HCV NS3 protease complexed with inhibitors, facilitating iterative structure-based drug design efforts with this pharmacologically important enzyme. Their solubility without the use of detergents also makes them very useful in screening assays for inhibitors.

As previously noted, one aspect of the invention is a modified HCV NS3 protease comprising at least one substitution in HCV NS3 protease of a hydrophobic α-helix 0 amino acid residue to a hydrophilic amino acid residue.

Another aspect of the invention is a modified HCV NS4a-NS3 fusion protease 
comprising a modified HCV NS3 protease fused to a HCV NS4a or modified HCV 
NS4a. 

Various combinations in which hydrophobic amino acids in α-helix 0 are substituted to hydrophilic amino acids are described elsewhere in the specification and in the claims. In a preferred embodiment, the hydrophobic α-helix 0 amino acid residues are selected from the group consisting of Leu13, Leu14, Ile17, Ile18, and Leu21. In a more preferred embodiment, Leu13 is substituted to glutamic acid, Leu14 is substituted to glutamic acid, Ile17 is substituted to glutamine, Ile18 is substituted to glutamic acid, and Leu21 is substituted to glutamine. (This is helix0-1 in Figure 11, SEQ ID NO: 6.) In another more preferred embodiment, Leu13 is substituted to glutamic acid, Leu14 is substituted to glutamine, Ile17 is substituted to glutamine, Ile18 is substituted to lysine, and Leu21 is substituted to histidine. (This is helix0-7 in Figure 11, SEQ ID NO: 9.)
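The two preferred α-helix 0 variants can be written as substitution maps keyed by NS3 residue number. The Python sketch below renders each variant in conventional notation (e.g., L13E) and checks that every substitution replaces a hydrophobic residue with one drawn from the X alphabet of Figure 1 (His, Lys, Glu, Gln, Asp, Asn). The dictionary layout and helper names are illustrative assumptions; only the substitutions themselves come from the text above.

```python
# Substitutions at alpha-helix 0 positions, keyed by NS3 residue number:
# residue -> (wild-type residue, substituted residue), one-letter codes.
HELIX0_VARIANTS = {
    "helix0-1": {13: ("L", "E"), 14: ("L", "E"), 17: ("I", "Q"),
                 18: ("I", "E"), 21: ("L", "Q")},
    "helix0-7": {13: ("L", "E"), 14: ("L", "Q"), 17: ("I", "Q"),
                 18: ("I", "K"), 21: ("L", "H")},
}

HYDROPHOBIC = set("AFILMPVW")
X_ALPHABET = set("HKEQDN")  # His, Lys, Glu, Gln, Asp, Asn (Figure 1)


def mutation_labels(variant: dict) -> list:
    """Render substitutions as e.g. 'L13E', sorted by residue number."""
    return [f"{wt}{pos}{new}" for pos, (wt, new) in sorted(variant.items())]


def all_hydrophobic_to_x(variant: dict) -> bool:
    """Check that each substitution is hydrophobic -> X-alphabet residue."""
    return all(wt in HYDROPHOBIC and new in X_ALPHABET
               for wt, new in variant.values())


for name, variant in HELIX0_VARIANTS.items():
    print(name, mutation_labels(variant), all_hydrophobic_to_x(variant))
# helix0-1 ['L13E', 'L14E', 'I17Q', 'I18E', 'L21Q'] True
# helix0-7 ['L13E', 'L14Q', 'I17Q', 'I18K', 'L21H'] True
```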

In another preferred embodiment of the invention, the modified HCV NS3 protease further comprises at least one substitution of a hydrophobic amino acid residue not in the α-helix 0 to a hydrophilic amino acid residue.

In an additional preferred embodiment, the modified HCV NS3 protease further comprises at least one substitution of a non-zinc-binding cysteine residue to a non-cysteine amino acid residue.
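Point mutations in this document are written as wild-type residue, NS3 position, new residue (e.g., C47S in Figure 5). The Python sketch below parses that notation and checks that a set of cysteine substitutions touches only the non-zinc-binding cysteines (Cys47, Cys52, Cys159) and spares the zinc-binding ones (Cys97, Cys99, Cys145). The parser, its regular expression, and the helper names are illustrative assumptions, not part of the described method.

```python
import re

ZINC_BINDING_CYS = {97, 99, 145}
NON_ZINC_BINDING_CYS = {47, 52, 159}

# Wild-type one-letter code, NS3 residue number, replacement one-letter code.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple:
    """Parse a label such as 'C47S' into ('C', 47, 'S')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wt, pos, new = match.groups()
    return wt, int(pos), new


def cysteine_changes_are_safe(labels) -> bool:
    """True if every Cys substitution hits a non-zinc-binding cysteine."""
    for label in labels:
        wt, pos, _new = parse_mutation(label)
        if wt != "C":
            continue  # not a cysteine substitution; ignore it
        if pos in ZINC_BINDING_CYS or pos not in NON_ZINC_BINDING_CYS:
            return False
    return True


# Mutations listed for Figure 5, Panel C.
figure5_panel_c = ["A40T", "I72T", "P86Q", "C47S", "C52L", "C159S"]
print(cysteine_changes_are_safe(figure5_panel_c))  # True
print(cysteine_changes_are_safe(["C97S"]))         # False
```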

In a preferred embodiment, the HCV NS3 protease that is altered comprises 
approximately residues 1-181 of HCV NS3. 
In a preferred embodiment of an aspect of the invention that is a modified HCV NS4a-NS3 fusion protease, the HCV NS4a that is altered or unaltered comprises approximately residues 21-31 of full-length HCV NS4a.

In an aspect of the invention that is a modified HCV NS4a-NS3 fusion protease, a preferred embodiment further comprises a linker comprising an optimized linker sequence. In a more preferred embodiment, the NS4a is linked to the amino terminus of the NS3. In a most preferred embodiment, the optimized turn sequence is Ser-Gly-Asp-Thr, where Ser corresponds to NS4a residue Ser32 and Thr corresponds to NS3 residue Thr4.

In a preferred embodiment of an aspect of the invention that is a modified HCV NS4a-NS3 fusion protease, the modified HCV NS4a further comprises at least one substitution of a hydrophobic amino acid residue to a hydrophilic amino acid residue. In a most preferred embodiment, NS4a residue 30 is substituted to asparagine.

The invention also includes isolated nucleic acid molecules encoding the proteins of the present invention, vectors comprising said nucleic acid molecules, and host cells comprising said vectors. E. coli cells harboring plasmids containing certain nucleic acid molecules of the present invention were deposited with the American Type Culture Collection (ATCC), 10801 University Blvd., Manassas, VA 20110, USA, and have ATCC accession numbers 204070 and 204071.

The invention also includes methods of making proteins of the present invention using host cells of the present invention.

The present invention exhibits distinct differences from, and improvements over, the prior art. In contrast to the published works documenting attempts to solubilize the HCV NS3 protease by relying solely on the fusion of solubilizing tags or protein fusion partners to the protease (e.g., Kim et al., 1996; Yan et al., 1998; Taremi et al., 1998), the present invention changes amino acid residues within the HCV NS3 protease coding region itself, resulting in what are referred to herein as modified forms of the HCV NS3 protease. These modified forms retain full enzymatic activity. The present invention provides solubility of greater than 30 mg/ml, the highest reported level of detergent-free solubility for an HCV NS3 protease of any kind (wild type or engineered in any way).
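To relate the mass concentration above to the >1 mM level cited earlier for NMR work, divide by the molecular weight: mg/ml equals g/l, and g/l divided by g/mol gives mol/l. The Python sketch below assumes a molecular weight of about 20 kDa, a rough estimate for roughly 180-200 residues at an average of about 110 Da per residue; the actual masses of the constructs are not given here, and the helper name is hypothetical.

```python
def mg_per_ml_to_mM(conc_mg_per_ml: float, mw_da: float) -> float:
    """Convert a protein concentration from mg/ml to mM.

    mg/ml equals g/l; dividing by the molecular weight (g/mol) gives mol/l,
    and multiplying by 1000 expresses the result in mmol/l (mM).
    """
    return conc_mg_per_ml / mw_da * 1000.0


ASSUMED_MW_DA = 20_000.0  # assumption: ~20 kDa for an NS4a-NS3 fusion protease

conc_mM = mg_per_ml_to_mM(30.0, ASSUMED_MW_DA)
print(f"{conc_mM:.2f} mM")  # 1.50 mM
```

Under this assumption, 30 mg/ml corresponds to about 1.5 mM; a heavier construct would give a proportionally lower molar concentration.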
Applicants have used the present invention to collect high-quality NMR spectra. Presented herein are high-quality NMR spectra of a modified form of HCV NS3 protease and of a modified HCV NS3 protease:inhibitor complex. These are the first reported instances of high-quality NMR spectra of an HCV protease (wild type or engineered in any way), alone or in complex with an inhibitor.

Applicants herein present the structure of a modified HCV NS3 protease:inhibitor complex determined by X-ray crystallography and demonstrate that the proteins of the present invention can rapidly produce high-quality co-crystals with protease inhibitors. This is the first reported instance of an X-ray crystal structure of an HCV protease (wild type or engineered in any way) complexed with an inhibitor. The proteins of this invention are especially useful because of their ability to produce high-quality co-crystals with protease inhibitors. Structure-based drug design with a protein is often limited by its ability to form diffraction-grade co-crystals with inhibitors in a timely manner. A protein that can be co-crystallized quickly, such as that of the present invention, facilitates the iterative process of structure-based design work.

The modified forms of HCV NS3 protease of the present invention are also useful for screening for small-molecule inhibitors of HCV NS3. Proteins of the present invention can be prepared in the absence of detergents, allowing the identification of compound inhibitors that would otherwise be undetectable if screened in the presence of detergents. The modified HCV NS3 protease in the non-fusion form can also be used to study whether a compound interferes with the binding of NS4a to NS3.

The general strategy used to obtain these soluble modified HCV NS3 protease variants was to sequentially target key regions of the protein that might be important for protein solubility, mutagenize these targeted regions in a semi-random manner, and either select or screen for bacterial clones expressing protein variants that exhibited higher degrees of solubility. The steps used to make some of the preferred embodiments of the invention are outlined in the Examples. Over the course of the Examples, a progressively higher degree of amino-acid residue substitution (relative to the starting wild-type sequence) was generated until modified forms of HCV NS3 protease exhibiting high levels of solubility with low levels of aggregation were obtained. Many different hydrophobic-to-hydrophilic HCV NS3 protease surface residue substitutions (naturally-occurring and non-naturally-occurring) were combined with different NS4a-NS3 fusion linker sequences, and these protease mutants were either selected or screened to find modified proteins that exhibited the desired solubility characteristics. Some completely conserved hydrophobic residue positions in α-helix 0 were targeted for mutagenesis because they make up a particularly extensive hydrophobic patch on the surface of HCV NS3 protease. Other HCV NS3 protease hydrophobic surface residues were also mutagenized; particularly good candidates were residues whose identity varies among the NS3 sequences of other wild-type HCV isolates. These and other substitutions are documented and included in the claims.

The following paragraphs describe in greater detail the invention claimed and ways of making it. While the order of the paragraphs follows the experimental process used, one of skill in the art can devise ways to make the invention claimed in which the steps are performed in a different order and/or some steps are omitted.

Any HCV NS3 protease and HCV NS4a sequences can be used as a starting point for the modifications. Here, a cloned HCV isolate sequence was used as the starting point for the mutagenesis experiments (Example 1, Figure 9, SEQ ID NO: 2). The expression system used features a synthetic gene in which all codons have been optimized for high-level expression in E. coli. While this may explain the high levels of expression observed for the resulting constructs, it is not essential to the invention, and any nucleotide sequence encoding an HCV NS3 protease (using the standard genetic code) could be used to express these proteins. HCV NS3 protease includes any fragment of wild type HCV NS3 that exhibits protease activity or any wild type peptide that exhibits the protease activity associated with wild type HCV NS3 (as defined in the Definitions section). It is also not essential that the modified forms of HCV NS3 protease described here be expressed in E. coli. Any in vivo expression host (bacterial, insect, plant, mammalian, or other) could be used to express these modified forms of HCV NS3 protease, and in vitro production of these variants is also possible. The present invention includes modified forms of HCV NS3 protease produced by any means.
The invention includes the use of HCV NS4a; see the Definitions section for the definition of HCV NS4a. In Example 2, the NS4a sequence comprising residues 21-31 (G21-S22-V23-V24-I25-V26-G27-R28-I29-V30-L31) of full-length NS4a (SEQ ID NO: 26) was fused to the N-terminus of the HCV NS3 protease. The linker sequence in this experiment was the simple dipeptide asparagine-glycine. A variety of other linkers could be used, and one of ordinary skill in the art would be able to choose other appropriate linkers. The NS4a could also be fused to the C-terminus of the NS3. As has been demonstrated (Lin et al., 1995; Tomei et al., 1995; Shimizu et al., 1996), an NS4a peptide including residues 21-31 increases the activity of the HCV NS3 protease in vitro. The linker described in Example 2 was used initially. However, better results (in terms of expression and solubility) were obtained with the optimized linker constructs resulting from the experiments described in Example 5. While others have very recently published other NS4a-NS3 linkers (Taremi et al., 1998; Pasquo et al., 1998), the optimized linkers documented in this invention (in combination with the other mutations described here) confer unprecedented levels of protease solubility.

The first α-helix of HCV NS3, known as α-helix 0, has an extremely hydrophobic solvent-exposed surface (Yan et al., 1998), and applicants believed that this could contribute to the insoluble character of preparations of wild type HCV NS3 protease (such as SEQ ID NO: 1 in Figure 9) and of unmodified HCV NS4a-NS3 fusion protein (such as SEQ ID NO: 3 in Figure 10). Therefore, the targeted/semi-random mutagenesis method was applied in an effort to change the hydrophobic solvent-exposed residues of α-helix 0 to more hydrophilic residue types (see Examples 3 and 4). All currently known strains of HCV have five hydrophobic solvent-exposed residues in α-helix 0 (L13, L14, I17, I18, L21), and, according to this invention, these could be changed alone or in any combination. Applicants chose to mutate all five in tandem. Changing other solvent-exposed hydrophobic residues in α-helix 0 in strains currently unknown is also encompassed by this invention, as is changing hydrophobic residues in α-helix 0 that are not solvent-exposed. Although applicants demonstrate changes to α-helix 0 of an HCV NS4a-NS3 fusion protease, the unfused HCV NS3 protease could also undergo such changes, and such a modified HCV NS3 protease is encompassed by this invention.
A simple method of isolating a soluble modified form of HCV NS3 protease from a large library of candidate mutants is to screen for the presence of modified HCV NS4a-NS3 fusion protein variants in the cleared supernatants of induced transformants by SDS-PAGE analysis, similar to the analysis depicted in Figure 4. This method is laborious and inherently low-throughput. A much more powerful, high-throughput alternative is a system in which the most soluble modified forms of HCV NS3 protease are selected from among all the other clones in the library. Applicants have used such a selection scheme to select for more soluble modified forms of HCV NS3 protease (see Example 4 and Figure 3). What applicants describe is a specific application of a general method for selecting soluble (or active) variants of proteases. This scheme can be used to screen for solubility or activity of any of the modified forms of HCV NS3 protease encompassed by this invention.

The modified HCV NS4a-NS3 fusion proteases having α-helix 0 mutations only (described in Example 4) retained full enzymatic activity and were considerably more soluble than the non-mutated NS4a-NS3 fusion (see Figure 4, compare lanes 3 and 5). These mutants are useful for experiments in which the protein concentration can be kept relatively low, such as enzyme assays. These mutants also allow the bacterial selection system (Figure 3) to be used as a screening system for HCV protease inhibitors, since inhibitors will decrease the growth of these induced bacteria in chloramphenicol media. Similarly, the modified HCV NS3 proteases that are not fusions could be used in enzyme assays and to screen for inhibitors. However, the modified HCV NS4a-NS3 fusion proteases having only α-helix 0 mutations still tended to aggregate at the high protein concentrations required for protein NMR (see Figure 5, panel A). One concern was that the NS4a and NS3 segments were not fused optimally. Therefore, work was done to optimize the linker sequence connecting the NS4a and NS3 segments.

Many different methods can be used to optimize a linker sequence. In general, the goal is to connect two protein segments with a polypeptide linker so that the relative spatial positioning of the two segments is maintained with minimal stress on, and perturbation to, the structure. Another important aspect is that the linker should ideally not be highly flexible, as this might tend to destabilize the protein. In this case, because the desired relative structural positioning of the NS4a and NS3 segments was known, a structure-based approach was taken to find an optimal linking sequence. As described in Example 5, the structural information in the Brookhaven Protein Data Bank (PDB) was mined to find short turn sequences that successfully linked two β-strands structurally similar to the NS4a (residues 27-31) and NS3 (residues 5-9) segments as seen in the published X-ray structure of the HCV NS3 protease/NS4a peptide complex. Coordinates from Protein Data Bank file 1JXP.pdb (Yan et al., Protein Science 7, 837-847 [1998]) were used for this purpose. In addition, the sequential proximity of a surface-exposed hydrophobic residue within the NS4a segment (Val30) provided the opportunity to test the effect of mutating this residue while trying the different linker sequences.

One linker sequence among those tested conferred high levels of expression and solubility. This optimized linker sequence in combination with the wild-type α-helix 0 (SEQ ID NO: 24 in Figure 18) does not confer solubility on the protease, whereas the optimized turn in combination with a modified α-helix 0 (SEQ ID NO: 14 in Figure 13) does (Figure 4, compare lanes 7 & 9). Overall, the quality of the NMR spectra obtained with the modified α-helix 0/optimized linker form of the protease (SEQ ID NO: 14, Figure 5, panel B) was superior to that obtained with the modified α-helix 0/non-optimized linker form of the protease (SEQ ID NO: 12, Figure 5, panel A), indicating a more soluble and less aggregation-prone form of the protease.

The effect of additional solvent-exposed hydrophobic-to-hydrophilic amino-acid substitutions was explored by changing certain surface residues in SEQ ID NO: 14, resulting in SEQ ID NO: 16 (see Example 6). The residues were chosen by aligning HCV NS3 protease sequences from published sequences of wild type HCV isolates and looking for naturally-occurring hydrophobic-to-hydrophilic amino-acid substitutions at residue positions that are solvent exposed in published NS3 protease structures. The combination of three of these naturally-occurring substitutions (A40T, I72T, P86Q) was found to have a solubility-enhancing effect (SEQ ID NO: 16 in Figure 14). In addition, three of the four non-zinc-binding cysteine residues were targeted for amino-acid substitutions (SEQ ID NO: 18 in Figure 15). Other sequences generated by this process can also yield useful results. The Helix0-7 sequence variant (see Figure 11, SEQ ID NO: 9) was substituted for the Helix0-1 sequence (see Figure 11, SEQ ID NO: 6) to generate two further variants, SEQ ID NOS: 20 and 22. These modified HCV NS4a-NS3 fusion protease variants were also highly soluble.

To demonstrate the usefulness of the resulting protease variants for structural studies of the protease and protease:inhibitor complexes, the modified HCV NS3 protease encoded by SEQ ID NO: 18 was characterized by multi-dimensional NMR spectroscopy (Examples 8 and 9) and X-ray crystallography (Example 10).

In Example 8, NMR spectroscopy of the apo-form of the protease was used to generate sequential backbone assignments and sidechain NMR assignments. These assignments include the catalytic triad residues His57, Asp81, and Ser139 and the residues spatially near them, indicating that spectra of the apo-form of the protein are a useful tool for analyzing the effect of adding potential NS3 protease inhibitors, using chemical shift perturbation mapping. As shown in Example 9, addition of a published peptidic HCV protease inhibitor causes many chemical shift perturbations for residues in the 1H-15N HSQC NMR spectrum of the protease, including the active site residues. This is the first publication of high quality NMR spectra of HCV NS3 protease and of an HCV NS3 protease:inhibitor complex. As demonstrated here, the modified protease yields high quality NMR spectra that can be used to identify compounds that bind to the protease. This chemical shift mapping technique can be used to identify novel compounds that bind to the protease by collecting a series of spectra in which different compounds have been mixed with the protease.
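As a rough illustration of chemical shift perturbation mapping, the Python sketch below compares apo and inhibitor-bound 1H-15N HSQC peak positions and flags residues whose peaks move. The weighted combined shift formula and the 0.14 nitrogen scaling factor are common conventions assumed here, not details taken from this specification, and the peak positions are made up for illustration.

```python
from math import sqrt

# Assumed weighting for 15N shift changes (not specified in the source);
# values of roughly 0.1-0.2 are commonly used.
ALPHA_N = 0.14


def combined_shift_change(d_h, d_n, alpha=ALPHA_N):
    """Weighted combined 1H/15N chemical shift change in ppm."""
    return sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def perturbed_residues(apo, bound, threshold=0.1):
    """Return {residue: shift change} for peaks moving more than `threshold` ppm.

    `apo` and `bound` map residue labels to (1H ppm, 15N ppm) peak positions.
    Residues whose peaks are absent from `bound` are simply skipped here.
    """
    hits = {}
    for residue, (h_apo, n_apo) in apo.items():
        if residue not in bound:
            continue
        h_bound, n_bound = bound[residue]
        delta = combined_shift_change(h_bound - h_apo, n_bound - n_apo)
        if delta > threshold:
            hits[residue] = delta
    return hits


# Made-up peak positions for illustration only (not data from Example 9).
apo = {"H57": (8.10, 120.0), "S139": (7.90, 115.0), "A5": (8.30, 123.0)}
bound = {"H57": (8.35, 121.5), "S139": (8.05, 116.0), "A5": (8.31, 123.1)}
hits = perturbed_residues(apo, bound)
print(sorted(hits))  # ['H57', 'S139']
```

In a compound screen, the same comparison would be repeated for each protease/compound mixture, and compounds that perturb residues near the catalytic triad would be prioritized. Peaks that broaden beyond detection on binding are often informative too, but this sketch ignores them.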

Example 10 demonstrates that the modified HCV NS3 protease encoded by SEQ ID NO: 18 can be co-crystallized with a published HCV protease inhibitor overnight, resulting in high quality crystals that diffract to 2 Å. This is the first publication of an HCV NS3 protease:inhibitor complex solved by X-ray crystallography, and it demonstrates that proteins produced by application of this invention can rapidly produce high quality co-crystals with protease inhibitors. In addition, the high resolution structure verifies that the NS4a-NS3 fusion construct reproduces the designed relative structural positioning of the NS4a and NS3 polypeptide segments.
The nucleic acid molecules of the present invention can be made by one of ordinary skill in the art using standard knowledge of codon usage and molecular biology techniques that can be found in, for example, "Molecular Cloning, A Laboratory Manual" (2nd edition, Sambrook, Fritsch and Maniatis 1989, Cold Spring Harbor Press).

The vectors of the present invention that comprise nucleic acids of the present invention can be made using any suitable vector as determined by one of ordinary skill in the art. Such vectors include, but are not limited to, vectors such as pBR322 and expression vectors such as the pET series (Novagen). The vectors of the present invention can be produced using standard molecular biology techniques as found in, for example, "Molecular Cloning, A Laboratory Manual" (2nd edition, Sambrook, Fritsch and Maniatis 1989, Cold Spring Harbor Press).

Any suitable host cell can be used, such as a bacterial, insect, plant, mammalian, or other cell. Conditions for expression and recovery of the proteins can be determined by one of ordinary skill in the art using techniques found in, for example, Protein Expression in Mammalian and Insect Cell Systems, S. Geisse and H.P. Kocher, in Methods in Enzymology, Vol. 306 (1999), p. 19-42.

All references cited in this specification are incorporated herein by reference. 

Examples

The following Examples explain how to make and use certain embodiments of the invention. From these Examples, the Detailed Description of the Invention, and the references cited therein, one of ordinary skill in the art can readily discern how to make these and other embodiments of the invention. The Examples are not meant to limit the scope of the invention; the scope of the invention is delineated by the claims.

In the following Examples, the standard residue numbering for HCV NS3 protease is used (as outlined in the Definitions). Where sequences are added to the N-terminus, the NS3 numbering remains the same, sometimes resulting in zero or negative numbers for the additional N-terminal residues. All presented protein sequences are aligned in Figure 6.
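The numbering convention can be illustrated with a short Python sketch. It assumes, for illustration, the junction sequence encoded by Turn oligo #1 of Example 5 (Asn at the NS4a Val30 position) and anchors the count on NS3 Ala5; the helper name `ns3_numbering` is hypothetical.

```python
def ns3_numbering(sequence, anchor_index, anchor_number):
    """Label each residue so that sequence[anchor_index] gets anchor_number.

    Residues N-terminal to the anchor are numbered downward through 0 into
    negative values, following the convention described above.
    """
    return [f"{aa}{anchor_number + (i - anchor_index)}"
            for i, aa in enumerate(sequence)]


# Junction region of the optimized NS4a-linker-NS3 fusion; the "A" at index 9
# is taken to be NS3 Ala5 (an assumption used for this illustration).
junction = "GRINLSGDTAYAQQT"
labels = ns3_numbering(junction, anchor_index=9, anchor_number=5)
print(" ".join(labels))
# G-4 R-3 I-2 N-1 L0 S1 G2 D3 T4 A5 Y6 A7 Q8 Q9 T10
```

Anchoring on a residue that is unchanged in every construct keeps the NS3 numbers stable even when linkers of different lengths are inserted upstream.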
Example 1 - Parental HCV protease DNA sequence

The HCV NS3-encoding DNA used as the basis for all subsequent modifications is a synthetic gene coding for the HCV protease (residues 1-181) shown in SEQ ID NO: 2 (Figure 9). Residues 1-181 comprise the portion of the HCV NS3 gene product that exhibits protease activity; longer fragments of the HCV NS3 protein could also be used. The synthetic gene was constructed so that all codons were optimized for high-level expression in E. coli. The protein-coding sequence of this construct is shown in SEQ ID NO: 1 (Figure 9). This HCV protease is produced at a high level when expressed from vector pET24a (Novagen) in E. coli strain BL21(DE3) (Novagen), but upon fractionation of the extract the protease is found in the insoluble fraction (data not shown).

Example 2 - Fusion of wild type NS4a to parental NS3 with a linker 

A plasmid was constructed that encoded the following portion of the full-length HCV NS4a sequence: NS4a residues 21-31, G21-S22-V23-V24-I25-V26-G27-R28-I29-V30-L31 (SEQ ID NO: 26). This portion of the NS4a sequence was fused to the amino terminus of the HCV NS3 protease sequence (NS3 HCV protease sequence 5-183). The fusion was constructed so that the NS4a segment was joined to the NS3 segment by a linker (Asn-Gly, aka NG), yielding the protein sequence ...GSVVIVGRIVLNGAYAQQ... at the NS4a-NS3 junction (see SEQ ID NO: 3 in Figure 10).

The expression plasmid for the NS4a-linker-NS3 fusion protease was 
constructed by a three-way ligation of the following three DNA preparations: 

1) The vector for the expression plasmid was a modified form of pET28a (Novagen), in which pET28a plasmid DNA had been double-digested with XhoI and SalI and subsequently ligated, destroying both sites in the vector. The resulting modified vector (mpET28a) was double-digested with NdeI and EcoRI.

2) Two synthetic 5'-phosphorylated oligonucleotides (coding for the NS4a and linker segments) were annealed, creating NdeI and XhoI sticky ends.

5'-TATGAAAAAAAAAGGATCCGTTGTTATCGTCGGCCGTATAGTACTGAACGGTGCTTACGCTCAGCAGAC-3'

5'-TCGAGTCTGCTGAGCGTAAGCACCGTTCAGTACTATACGGCCGACGATAACAACGGATCC-3'

3) The NS3-coding DNA from SEQ ID NO: 1 was PCR amplified with the following oligonucleotides, which created a silent mutation encoding an XhoI site. The resulting PCR fragment was digested with XhoI and EcoRI.

5'-CAGCAGACTCGAGGTCTGC-3'

5'-GCACGAATTCACGGGGAACGCATGG-3'
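As a quick sanity check on the two PCR primers, the Python sketch below searches them for the recognition sequences of the enzymes named in Examples 2 and 3 and translates the 5' primer in frame. It assumes that the 5' primer's first base is the first base of the Gln8 codon (consistent with the codon layout shown in Example 3). The function names are illustrative only.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Recognition sequences of restriction enzymes named in Examples 2 and 3.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG", "SalI": "GTCGAC", "EcoRI": "GAATTC"}


def find_sites(seq):
    """Map each enzyme whose site occurs in `seq` to its first 0-based position."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def translate(seq):
    """Translate the complete codons of `seq` in frame 0."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


forward = "CAGCAGACTCGAGGTCTGC"        # 5' primer (carries the XhoI site)
reverse = "GCACGAATTCACGGGGAACGCATGG"  # 3' primer (carries the EcoRI site)
print(find_sites(forward), translate(forward))  # {'XhoI': 7} QQTRGL
print(find_sites(reverse))                      # {'EcoRI': 4}
```

Because CGA still encodes Arg, the primer reads Gln-Gln-Thr-Arg-Gly-Leu, so the XhoI site is introduced without changing the protein sequence.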



The plasmid product of this three-way ligation codes for an NS4a-NS3 fusion protein (SEQ ID NO: 3, Figure 10). This fusion protein is produced at a high level when expressed in E. coli, but upon fractionation of the extract it is found in the insoluble fraction (see Figure 4, lanes 2 & 3).

Example 3 - Generation of a large library of modified HCV NS4a-NS3 fusion proteases in which the hydrophobic solvent-exposed residues of α-helix 0 are replaced with hydrophilic residues

The HCV NS3 protease sequence generated in Example 2 (SEQ ID NO: 3) was PCR amplified and moved into another expression vector as an NcoI-SalI fragment. This modified expression vector (depicted in Figure 3) is derived from pET21 (Novagen) and includes a modified Tet repressor into which an HCV NS3 protease site has been inserted. The vector was cut with NcoI and XhoI, and ligation with the NcoI-SalI fragment destroyed the XhoI site within the pET21-derived multiple cloning site of this plasmid vector.

The five hydrophobic solvent-exposed residues in α-helix 0 (L13, L14, I17, I18, L21) were singled out for targeted semi-random mutagenesis using the biased codon method (Kamtekar et al., Science 262, 1680-1685 [1993]). In this method, a "VAV" codon (V = G, C, or A) encodes a mixture of nine codons that together code for six hydrophilic residue types (His, Glu, Gln, Asp, Asn, or Lys).
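The codon arithmetic behind the biased codon method can be checked with a few lines of Python. The sketch below expands the degenerate VAV codon into its concrete codons, translates them with the standard genetic code, and computes the size of the five-position library at the amino-acid level. The helper names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC code used in the mutagenic primer: V = A, C or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG"}


def expand(degenerate):
    """List every concrete DNA sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate))]


codons = expand("VAV")
encoded = Counter(CODON_TABLE[c] for c in codons)
print(len(codons), dict(sorted(encoded.items())))
# 9 {'D': 1, 'E': 2, 'H': 1, 'K': 2, 'N': 1, 'Q': 2}
print(len(encoded) ** 5)  # 7776 amino-acid sequences over five positions
```

Glu, Gln and Lys are each encoded by two of the nine codons, while Asp, His and Asn are each encoded by one. An equimolar VAV mixture therefore would not sample the six residue types equally, although no stop codons or hydrophobic residues can arise.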
The targeted semi-random mutagenesis was carried out using the following oligonucleotide as the 5' primer for PCR amplification of the HCV NS3 protease sequence:

     XhoI
Gln Thr Arg Gly *** *** Gly Cys *** *** Thr Ser *** Thr Gly Arg Asp
 9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
CAG ACT CGA GGT VAV VAV GGT TGC VAV VAV ACC TCC VAV ACC GGT CGT GAC

where V = (G or C or A)



A 3' primer was used to prime at a site downstream of the NS3 protease stop codon and EcoRI site. The PCR products were digested with XhoI and EcoRI and ligated into the XhoI-EcoRI-cleaved HCV NS4a-NS3 fusion protease expression vector described at the beginning of this Example.

In this way, a library of 7776 (= 6^5) potential unique modified HCV NS4a-NS3 fusion proteases with different mutant hydrophilic α-helix 0 sequences was generated. Portions of the ligation mixture were electroporated into E. coli strain MC1061 (ATCC 5338). Over 50,000 transformants were pooled and plasmid DNA was isolated.
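A rough estimate can relate the number of pooled transformants to the library size. The Python sketch below assumes an ideal equimolar VAV mixture at each of the five positions, no PCR or cloning bias, and one independent insert per transformant. None of these assumptions, and none of the resulting numbers, come from the source.

```python
from itertools import product

# Per-position amino-acid frequencies for an ideal, equimolar VAV codon:
# Glu, Gln and Lys are each encoded by 2 of the 9 codons; Asp, His and Asn by 1.
VAV_FREQ = {"E": 2 / 9, "Q": 2 / 9, "K": 2 / 9, "D": 1 / 9, "H": 1 / 9, "N": 1 / 9}


def expected_coverage(n_clones, n_positions=5):
    """Return (number of protein variants, expected fraction seen at least once)."""
    n_variants = 0
    total = 0.0
    for combo in product(VAV_FREQ, repeat=n_positions):
        p = 1.0
        for aa in combo:
            p *= VAV_FREQ[aa]
        # Probability that this variant appears at least once among n_clones.
        total += 1.0 - (1.0 - p) ** n_clones
        n_variants += 1
    return n_variants, total / n_variants


n_variants, coverage = expected_coverage(50_000)
# For comparison: a library in which every protein variant is equally likely.
uniform = 1.0 - (1.0 - 1.0 / n_variants) ** 50_000
print(n_variants, round(coverage, 3), round(uniform, 4))
```

Under these assumptions, about 95% of the 7776 protein-level variants would be expected among 50,000 clones. A uniformly represented library would give roughly 99.8%. The shortfall arises because variants rich in Asp, His and Asn are under-represented by VAV codons.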

The plasmid used as a vector for the construction of this library is diagrammed in Figure 3 and is relevant to the mutant selection experiments described in Example 4. When another selection or screening method is used (such as screening by SDS-PAGE analysis of homogenates of induced transformants, as in Figure 4), other plasmid vectors would be suitable.

Example 4 - Bacterial selection of soluble modified HCV NS4a-NS3 fusion proteases

The bacterial selection system is diagrammed in Figure 3. The system features a plasmid encoding a mutagenized HCV protease gene (in this case, the modified HCV NS4a-NS3 fusion proteases with the α-helix 0 mutations) as well as a gene encoding a modified Tet repressor. The modification of the Tet repressor is the introduction of an HCV NS3 protease cleavage site within a solvent-exposed loop of the Tet repressor protein. The strain carrying this plasmid also has a chromosomally encoded chloramphenicol acetyltransferase gene (CAT, conferring chloramphenicol resistance) under the control of the Tet promoter. After induction of the modified HCV NS4a-NS3 fusion protease by IPTG (under lac-T7 control), the strain either becomes chloramphenicol resistant (CmR) if protease activity is present (cleavage of the modified Tet repressor allows expression of CAT) or remains chloramphenicol sensitive (CmS) if fusion protease activity is not present (the modified Tet repressor is not cleaved and therefore represses CAT).

In this case, the expressed wild-type HCV protease is inherently active; however, its activity is masked by the insoluble character of the expressed protease, resulting in a CmS phenotype. If a more soluble, active mutant protease is expressed among the pool of mutagenized transformants, the inherent activity of the protease is unmasked by the soluble nature of the mutant enzyme, and the cells harboring this mutant protease become CmR.
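The logic of the selection can be summarized as a simple truth table. The Python sketch below is a boolean simplification; actual readouts are graded, as the chloramphenicol titration in the next paragraph shows. The `Clone` class and `phenotype` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    """Hypothetical description of one transformant in the selection."""
    name: str
    active: bool             # protease is catalytically competent
    soluble: bool            # protease is soluble enough to act on the repressor
    inhibited: bool = False  # a protease inhibitor is present


def phenotype(clone: Clone) -> str:
    """Boolean simplification of the Tet repressor cleavage readout."""
    # Cleavage of the modified Tet repressor de-represses the Tet promoter,
    # allowing CAT expression and hence chloramphenicol resistance.
    repressor_cleaved = clone.active and clone.soluble and not clone.inhibited
    return "CmR" if repressor_cleaved else "CmS"


clones = [
    Clone("parental fusion", active=True, soluble=False),
    Clone("soluble helix 0 mutant", active=True, soluble=True),
    Clone("soluble mutant plus inhibitor", active=True, soluble=True, inhibited=True),
]
for clone in clones:
    print(clone.name, phenotype(clone))  # CmS, CmR, CmS respectively
```

The third case corresponds to the inhibitor-screening use of this system: a compound that blocks a soluble, active protease restores repression of CAT and so reduces growth on chloramphenicol.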

Expression of the parental HCV NS4a-NS3 fusion protease (SEQ ID NO: 3) in this selection system yielded E. coli cells that grew only very slowly on agar plates containing 1 µg/ml chloramphenicol. The library of α-helix 0 mutants (described in Example 3) was transformed into the E. coli selection strain. Many plasmids encoding α-helix 0-mutagenized modified HCV NS4a-NS3 fusion proteases conferred on induced transformed E. coli cells an enhanced ability to grow on plates with low levels of chloramphenicol (1-3 µg/ml). Twelve transformants, however, grew on plates with very high levels of chloramphenicol (30 µg/ml). These highly CmR transformants were colony purified, and the solubilities of the expressed mutant HCV NS4a-NS3 fusion proteases were evaluated by SDS-PAGE analysis (similar to that depicted in Figure 4). Six of the transformants exhibited more soluble proteases than the others, and plasmid DNA from these six was prepared and sequenced. The relevant portions (through the NS3 α-helix 0 segment) of the sequences from these isolates are listed as SEQ ID NOS: 6-11 (see Figure 11).

As shown in Figure 4 (lanes 4 & 5), expression of a modified HCV NS4a-NS3 fusion protease with the Helix0-1 sequence (see SEQ ID NO: 6 and SEQ ID NO: 12) produced a protein that was in the soluble fraction, whereas the fusion protein with the wild-type α-helix 0 sequence was insoluble (lanes 2 & 3). Similar enhancements in solubility were obtained with the other five sequenced variants.
E. coli cells containing a similar expression system have been deposited with the ATCC and have ATCC accession number 207047. The cells submitted to the ATCC differ from the cells used here in two respects: the ATCC cells have a plasmid containing a CMV protease and the CMV protease cleavage site within the Tet repressor, and their CAT gene is on a second plasmid rather than on the chromosome. One of ordinary skill in the art could make cells useful for the present invention from the cells deposited with the ATCC by replacing the CMV protease sequences with HCV protease sequences and changing the CMV cleavage site coded within the Tet repressor gene to an HCV protease cleavage site. Cells useful for the present invention could have the CAT gene on a second compatible plasmid rather than on the chromosome.

The cells having ATCC accession number 207047 and the bacterial selection system referred to herein are further described in application U.S. Serial No. 60/115,270, filed on January 8, 1999, and application U.S. Serial No. , filed on even date herewith, both of which are incorporated herein by reference.

Example 5 - Linker optimization in a modified HCV NS4a-NS3 fusion protease, including change in NS4a

Structural information from the Protein Data Bank was used to identify structurally characterized proteins that have two β-strands (structurally homologous to NS4a residues G27-R28-I29-V30-L31 and NS3 residues A5-Y6-A7-Q8-Q9) linked by a tight turn (Searchloop function in the Insight II program, Molecular Simulations Inc.). Three different turn types were identified (as exemplified by the Brookhaven Protein Data Bank (PDB) files 1OPB [residues 45-48], 1LID [residues 109-112], and 1EUR [residues 177-184]), and three sets of degenerate double-stranded oligonucleotides were synthesized that coded for 8 to 12 variants of each of the three turn types.

Turn oligo #1:

Asn Asn Ser

GlyArgIleIleLeuSerGlyAspThrAlaTyrAlaGlnGlnThr
GGCCGTATCAWCCTGTCCGGTRACACCGCTTACKCTCAGCAGAC

CATAGTWGGACAGGCCAYTGTGGCGAATGMGAGTCGTCTGAGCT
Turn oligo #2:

Asn Asn Ser

GlyArgIleIleLeuSerAspGlyThrAlaTyrAlaGlnGlnThr
GGCCGTATCAWCCTGTCCRACGGTACCGCTTACKCTCAGCAGAC

CATAGTWGGACAGGYTGCCATGGCGAATGMGAGTCGTCTGAGCT

Turn oligo #3:

Asn Ser
GlyArgIleIleLeuSerAspGlyGlyIleThrAlaTyrAlaGlnGlnThr
GGCCGTATCAWCCTGTCCGACGGTGGTATCACCGCTTACKCTCAGCAGAC

CATAGTWGGACAGGCTGCCACCATAGTGGCGAATGMGAGTCGTCTGAGCT

Where W = (A,T); K = (G,T); M = (A,C); R = (A,G); Y = (C,T). 

These variants (20 possible sequences in all) were incorporated into an optimized construct from Example 4 by subcloning the three oligos separately using the EagI and XhoI sites in SEQ ID NO: 13.
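The degenerate positions in the three turn oligos can be expanded programmatically to confirm the count of 20. The Python sketch below uses the IUPAC codes given above. It reads each top strand in frame starting at the GGC (Gly) codon and ignores the trailing partial codon. Names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC degeneracy codes listed with the turn oligos.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "K": "GT", "M": "AC", "R": "AG", "Y": "CT"}

# Top strands of the three turn oligos, starting at the GGC (Gly) codon.
TURN_OLIGOS = {
    "turn1": "GGCCGTATCAWCCTGTCCGGTRACACCGCTTACKCTCAGCAGAC",
    "turn2": "GGCCGTATCAWCCTGTCCRACGGTACCGCTTACKCTCAGCAGAC",
    "turn3": "GGCCGTATCAWCCTGTCCGACGGTGGTATCACCGCTTACKCTCAGCAGAC",
}


def expand(degenerate):
    """List every concrete DNA sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate))]


def translate(seq):
    """Translate the complete codons of `seq` in frame 0."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


peptides = {name: sorted({translate(s) for s in expand(oligo)})
            for name, oligo in TURN_OLIGOS.items()}
for name, variants in peptides.items():
    print(name, len(variants))                 # turn1 8, turn2 8, turn3 4
print(sum(len(v) for v in peptides.values()))  # 20
```

The turn1 set includes GRINLSGDTAYAQQ. This is the junction sequence of the variant selected below, with Asn at the NS4a Val30 position and Ala at NS3 position 7.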

In addition to the linker sequence, a single solvent-exposed residue within the NS4a sequence (Val30) was allowed to be either isoleucine or asparagine in this series of linker variants. This residue position is always occupied by a hydrophobic residue (usually Val or Ile) in wild-type isolates of HCV, so an Asn substitution at this position would be a non-naturally occurring substitution. In contrast, the residue corresponding to NS3 residue 7 (Ala7) was allowed to be either alanine or serine, both of which are present at this position in different wild-type isolates of HCV. It was hypothesized that serine at this solvent-exposed position might confer more solubility because it is more hydrophilic than alanine.

Protease expression levels and solubilities of randomly picked linker-variant modified HCV NS4a-NS3 fusion proteases were monitored by SDS-PAGE analysis of the soluble fractions of induced cell lysates (same procedure as outlined in the legend to Figure 4). One linker variant (resulting from incorporation of a variant of Turn oligo #1) was clearly better than the rest in terms of both protein expression levels and solubility. This linker sequence (...G-4 R-3 I-2 N-1 L0 S1 G2 D3 T4 A5 Y6 A7 Q8 Q9 T10...) and the resulting modified HCV NS4a-NS3 fusion protease are shown in
SEQ ID NO: 14. It incorporates the asparagine mutation within the NS4a segment 
(numbered -1 in this fusion construct and numbered 30 in the NS4a sequence) and 
retains the alanine at NS3 position 7. 

Figure 6 shows an alignment of the protein sequences of SEQ ID NOS: 1, 3, 12, 14, 16, 18, 20, 22 and 24. As seen in Figure 6, the linker segment of SEQ ID NO: 14 is two residues longer than the original linker used in SEQ ID NOS: 3 and 12.

The optimized turn sequence by itself is not sufficient to confer solubility on an NS4a-NS3 fusion. This can be seen in lanes 6 & 7 of Figure 4, where the optimized turn in combination with the wild-type α-helix 0 (SEQ ID NO: 24) does not confer high solubility. Only the presence of the α-helix 0 mutations, either without or with the optimized turn sequence (see Figure 4, lanes 4 & 5 and lanes 8 & 9, respectively), allows high levels of fusion protease in the supernatant fractions.

However, the experiment shown in Figure 4 only shows whether the expressed protein fractionates in the soluble fraction. A more detailed analysis of each construct's suitability for structural studies was performed using NMR, with which the protein aggregation state can be assessed. 1H-15N HSQC NMR spectra were collected for 15N-labeled proteins of SEQ ID NO: 12 and SEQ ID NO: 14. As shown in Figure 5, the protein of SEQ ID NO: 14 (panel B, mutant α-helix 0 with optimized linker) generates a higher quality HSQC spectrum (as judged by NMR linewidth and resolution of 2D peaks) than the protein produced from SEQ ID NO: 12 (panel A, mutant α-helix 0 with non-optimized linker). This analysis shows that the optimized linker derived in this Example contributes favorably to the overall solubility and non-aggregation of the modified HCV NS4a-NS3 fusion protease.

Example 6 - Incorporation in modified HCV NS4a-NS3 fusion proteases of naturally-occurring hydrophobic-to-hydrophilic residue substitutions and change of non-zinc-binding cysteines to non-cysteine amino acids

Inspection of a protein sequence alignment of the HCV NS3 portions of different published HCV isolates (not shown) showed that many residue positions are variable among the different isolates. In particular, a number of solvent-exposed surface residue positions can take on different naturally-occurring amino-acid residue types that differ in their hydrophobic-hydrophilic character. These residues include Ala39 (sometimes Ser), Ala40 (sometimes Thr), Pro67 (sometimes Ser), Ile72 (sometimes Thr), and Pro86 (sometimes Gln).

Substitution of non-essential cysteine residues is a method sometimes used in attempts to improve a protein's biochemical properties by reducing the tendency of the protein to form disulfide-linked multimers upon oxidation (for example, Yamazaki et al., Protein Science [1996] 5, 495-506). Preliminary experiments showed that substitution of each of the four non-zinc-binding cysteines within HCV NS3 protease had minimal effects upon enzymatic activity (data not shown).

Inspection of the protein sequence alignment showed that, within a given sequence, two of the non-zinc-binding cysteine residues (Cys47 and Cys52) either appeared together as a cysteine pair or were both changed to non-cysteine residue types. This covariation, coupled with the spatial arrangement of the two residues within HCV NS3 protease, indicates that this pair is likely to form a disulfide linkage. A third non-zinc-binding cysteine residue (Cys159) was also targeted for mutagenesis, although this position is invariant among the different HCV isolates inspected.

Using this information, six oligonucleotides were designed that code for alternate residue types at selected solvent-exposed residue positions, with a bias toward substitution of hydrophilic residue types in place of hydrophobic ones. In addition, three of the four non-zinc-binding cysteine residues (Cys47, Cys52, and Cys159) were changed to non-cysteine residues.

Surface Oligo#1 (P86Q):
5'-ACGGGAACCCTGCGGAGCTGCCAACCAACCAGGTCTTTG-3'

Surface Oligo#2 (P67P/S; I72T):
5'-CAACGTTGGTGTACATCTGGGTAACCGGACCTTTCGRGGAAGCGATGGTACGGGT-3'

Surface Oligo#3 (A39A/S; A40T):
5'-CCAGGAAGGTCTGGGTAGMGGTGGAAACGATCTGAAC-3'

Cys Oligo#1 (C159S/T):
5'-CTTTAGCAACACCACGGGTGGWAACAGCAGCACGGAAGAT-3'

Cys Oligo#2 (C47S/T; C52L/M):
5'-ACCGTGGTAAACGGTCCACAKAACACCGTTGATGGWGGTAGCCAGGAAGGTC-3'

Cys Oligo#3 (A39A/S; A40T; C47S/T; C52L/M):
5'-ACCGTGGTAAACGGTCCACAKAACACCGTTGATGGWGGTAGCCAGGAAGGTCTGGGTAGMGGTGGAAACGATCTGAAC-3'

Where W = (A,T); K = (G,T); M = (A,C); R = (A,G).
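The degenerate positions above can be enumerated programmatically. The following is a minimal Python sketch, not part of the described procedure, that expands the IUPAC codes W, K, M and R into every concrete oligonucleotide and translates each variant. It assumes, as inferred by translating the sequences rather than stated in the text, that the oligonucleotides are written as the antisense strand, so each one is reverse-complemented before translation; the reading-frame offsets used in the checks are likewise inferred.

```python
from itertools import product

# IUPAC nucleotide codes needed for the oligonucleotides listed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "K": "GT", "M": "AC", "R": "AG"}

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}

OLIGOS = {
    "Surface Oligo#1": "ACGGGAACCCTGCGGAGCTGCCAACCAACCAGGTCTTTG",
    "Surface Oligo#2": "CAACGTTGGTGTACATCTGGGTAACCGGACCTTTCGRGGAAGCGATGGTACGGGT",
    "Surface Oligo#3": "CCAGGAAGGTCTGGGTAGMGGTGGAAACGATCTGAAC",
    "Cys Oligo#1": "CTTTAGCAACACCACGGGTGGWAACAGCAGCACGGAAGAT",
    "Cys Oligo#2": "ACCGTGGTAAACGGTCCACAKAACACCGTTGATGGWGGTAGCCAGGAAGGTC",
    "Cys Oligo#3": ("ACCGTGGTAAACGGTCCACAKAACACCGTTGATGGWGGTAGCCAGGAAGGTC"
                    "TGGGTAGMGGTGGAAACGATCTGAAC"),
}


def expand(oligo: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate oligonucleotide."""
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in oligo))]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, offset: int = 0) -> str:
    """Translate seq from `offset`, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(offset, len(seq) - 2, 3))


if __name__ == "__main__":
    for name, oligo in OLIGOS.items():
        print(f"{name}: {len(expand(oligo))} concrete sequence(s)")
```

Translated this way, Surface Oligo#2 reads ...ASPKGPVTQ... (or ...ASSKGPVTQ...), placing Thr rather than Ile at position 72, and Cys Oligo#2 reads ...LAT(S/T)INGV(L/M)WTVYHG, consistent with the C47S/T and C52L/M labels.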

The site-directed mutations were introduced into SEQ ID NO: 13 by the dut-ung method (Kunkel, 1985) using the Muta-Gene Phagemid kit (Bio-Rad). The mutations were generated by using the six oligonucleotides in different combinations to produce distinct sets of clones having different mutation combinations. Expression levels and solubilities of the randomly picked mutant NS4a-NS3 fusion protease variants were monitored by SDS-PAGE analysis of the soluble fractions of induced cell lysates (similar to the analysis shown in Figure 4), and some of the clones exhibiting enhanced protease solubility were sequenced. The A40T, I72T, and P86Q mutations were found to have the most pronounced positive effect on solubility.

To combine these mutations with the optimized linker sequence generated in Example 5, DNA sequences encoding the A40T, I72T, and P86Q surface mutants were subcloned into SEQ ID NO: 15 without and with the C47S, C52L, and C159S cysteine mutations to produce the proteins presented in SEQ ID NO: 16 (see Figure 14) and SEQ ID NO: 18 (see Figure 15), respectively. The 2D ¹H-¹⁵N HSQC NMR spectrum of the protein produced from SEQ ID NO: 18, shown in Figure 5 panel C, clearly indicates that this protease variant is highly non-aggregating and suitable for high-resolution NMR structural analysis. E. coli cells harboring the plasmid containing SEQ ID NO: 19, which is the DNA sequence encoding SEQ ID NO: 18, have been deposited with the ATCC and have ATCC accession number 207040.

Additional modified HCV NS4a-NS3 fusion proteases analogous to the one shown in SEQ ID NO: 18 were constructed in which another α-helix 0 variant sequence identified in Example 4 was substituted. Variant α-helix 0-7, identified as SEQ ID NO: 9 in Figure 11, was substituted, resulting in SEQ ID NO: 20 (see Figure 16) and SEQ ID NO: 22 (see Figure 17). SEQ ID NO: 22 is the same as SEQ ID NO: 20, except that an additional naturally-occurring amino-acid substitution (C16T) was included. When expressed, both of these modified HCV NS4a-NS3 fusion proteases had solubilities comparable to that of SEQ ID NO: 18 and, as expected, exhibited chromatographic properties on ion-exchange media that differed somewhat from those of the SEQ ID NO: 18 homolog. 2D ¹H-¹⁵N HSQC spectra confirmed that these protein variants were folded similarly to the SEQ ID NO: 18 protease (data not shown). E. coli cells harboring the plasmid containing SEQ ID NO: 23, which is the DNA sequence encoding SEQ ID NO: 22, have been deposited with the ATCC and have ATCC accession number 207041.

Example 7 - Expression and Purification in E. coli 

All constructs were expressed in E. coli strain BL21(DE3) (Novagen) using one of the pET plasmid vectors (Novagen). Proteins were expressed either as polyhistidine-tagged proteins using the pET28a vector, or as non-tagged proteins using the pET29a vector. Probably owing to optimized bacterial codon usage and massive overproduction, expression of these constructs resulted in translational readthrough protein products (~10-20%) in addition to the predicted full-length protein product. Modification of the expression vectors to include a triple-stop set of codons (TAA TAA TGA) eliminates the readthrough products (data not shown).

The following two purification methods are outlined for the modified HCV NS4a-NS3 fusion protease produced from expression of SEQ ID NO: 19 in the pET29a vector system (no tag). However, one skilled in the art could readily modify the procedures slightly to purify any of the modified forms of HCV NS3 protease of the present invention.

Method 1 

Expression of the non-tagged variant from SEQ ID NO: 19 was carried out in minimal bacterial growth media, with induction by 0.3 mM IPTG when the cell density reached OD600 = 1.0. Concurrent with induction of the culture, ZnCl2 was added to a final concentration of 30 µM, and the cells were transferred to 20 degrees C for 20 hours.

After centrifugation, the cell pellet was resuspended in 25 mM Na-phosphate buffer pH 7.5, 0.5 M NaCl, 2 mM DTT, 10 µM ZnCl2, and the cells were disrupted by passage through a High Pressure Homogenizer (RANNI model 8.30H). The homogenate was clarified at 15,000 rpm (Sorvall model SS34 rotor) for 30 min, and 10 mM MgCl2 and 20 µg/ml DNase/RNase were added to the supernatant.
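Centrifugation speeds here and in Method 2 are given in rpm for a Sorvall SS34 rotor, whereas the force on the sample depends on the rotor radius through the standard relation RCF = 1.118 × 10⁻⁵ × r × N² (r in cm, N in rpm). The short Python sketch below applies that relation. The radius of 10.7 cm is an assumed approximate maximum radius for the SS34 rotor, not a value from the text, and should be checked against the rotor documentation.

```python
# Relative centrifugal force (x g) from rotor speed: RCF = 1.118e-5 * r(cm) * rpm**2
SS34_RADIUS_CM = 10.7  # assumption: approximate maximum radius of a Sorvall SS34 rotor


def rcf(rpm: float, radius_cm: float = SS34_RADIUS_CM) -> float:
    """Approximate relative centrifugal force at the given radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


if __name__ == "__main__":
    for rpm in (10_000, 15_000, 16_000):  # speeds used in Methods 1 and 2
        print(f"{rpm} rpm ~ {rcf(rpm):,.0f} x g")
```

Under that radius assumption, 15,000 rpm corresponds to roughly 27,000 × g and 16,000 rpm to roughly 31,000 × g at the tube bottom.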

After incubation at room temperature for 10 minutes, the supernatant was diluted twice with 25 mM Na-phosphate buffer, 2 mM DTT, 10 µM ZnCl2 and applied onto a Macro-Prep S column (Bio-Rad, 1 kg of resin) equilibrated with 25 mM Na-phosphate pH 7.5, 0.2 M NaCl, 2 mM DTT, 10 µM ZnCl2. After washing the column until the OD280 was approximately 0.1, the bound protein was eluted with the same buffer containing 0.5 M NaCl.

The eluate was concentrated on an Amicon YM5 membrane and applied onto a Superdex 30 26/60 column equilibrated with 25 mM Na-phosphate pH 7.5, 0.2 M NaCl, 2 mM DTT, 10 µM ZnCl2. The fractions of the NS3 peak were applied onto an SP Sepharose 26/10 column. Buffer A was the same buffer as for the Superdex 30 column; Buffer B was the same buffer with 1 M NaCl. The protein peak elutes at 0.5-0.6 M NaCl.

For crystallization, the purified protein was exchanged into 0.5 M NaCl, 25 mM MES (2-(N-morpholino)ethanesulfonic acid), pH 6.5, 10% (v/v) glycerol, 2 mM dithiothreitol (DTT) and could be concentrated to 5 mM (~100 mg/ml). For NMR spectroscopy, the protein was exchanged into 25 mM sodium phosphate pH 6.5, 50 mM sodium sulfate, 2 mM deuterated DTT, and 10% D2O and could be concentrated to at least 3 mM.
Integrity of the preparation was verified by mass spectrometry. Using this purification method, the final yields are typically 50-65 mg of pure protein per liter of culture.
Method 2 

For crystallography, the following modified protocol was found to produce a preparation that crystallized readily (overnight):

After cell disruption in the homogenizer (as in Method 1), the homogenate was centrifuged (Sorvall model SS34 rotor) for 30 min at 16,000 rpm. The supernatant was treated with PEI (polyethylenimine, 0.2% final) for 20 min at room temperature with stirring. The resulting white suspension was centrifuged at 16,000 rpm for 20 min, and the supernatant was precipitated with ammonium sulfate (40%) at 4 degrees C for 30 min. The solution was centrifuged at 10,000 rpm for 30 min. The pellet was resuspended in 25 mM Na-phosphate, pH 7.5, 2 mM DTT, 10 µM ZnCl2 (10 ml per liter of culture) and centrifuged again at 16,000 rpm for 10 min. The supernatant was applied first onto a Superdex 75 26/60 column equilibrated with 25 mM Na-phosphate buffer, pH 7.5, 0.2 M NaCl, 2 mM DTT, 10 µM ZnCl2. The peak fractions were then applied to an SP Sepharose 26/10 column. Buffers A and B are the same as in Method 1.

After concentration of the peak fractions on an Amicon membrane, the protein was applied onto a Superdex 30 16/60 column equilibrated with 25 mM Na-phosphate, pH 7.5, 2 mM DTT, 10 µM ZnCl2, and no salt. Only the purest side fractions of the HCV NS3 protease peak were collected and pooled.

After concentration (Millipore Ultrafree, 5K cutoff), the protein was exchanged into 25 mM MES buffer, pH 6.5, 0.5 M NaCl, 2 mM DTT, 10 µM ZnCl2. The protein preparation concentrated easily and readily produced crystals (as outlined in Example 10) even after four months of storage at 4 degrees C.

Example 8 - NMR spectroscopy of modified HCV NS4a-NS3 fusion proteases 

Modified HCV NS4a-NS3 fusion proteases were prepared for NMR analysis by exchanging the purified protein (see Method 1, Example 7) into NMR buffer (25 mM sodium phosphate pH 6.5, 50 mM sodium sulfate, 2 mM deuterated DTT, and 10% D2O). Protease samples both with and without readthrough product (see Example 7) were successfully used for NMR spectroscopy in both Examples 8 and 9. Sample concentrations ranged from 0.2 mM to 3 mM.

Two-dimensional ¹H-¹⁵N HSQC NMR spectra were obtained using a WATERGATE HSQC pulse sequence (Mori et al., (1995) J. Magn. Reson. B108, 94-98; Sklenar, (1995) J. Magn. Reson. A114, 132-135) on a Varian UNITY PLUS 600 MHz NMR spectrometer. The data were collected at 30 degrees C with 4 transients per FID and either 128 or 256 increments, with spectral widths of 10.0 and 2.4 kHz in F2 (¹H) and F1 (¹⁵N), respectively.
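Because the spectral widths are quoted in kHz, it can help to express them in ppm, the unit in which HSQC spectra are usually read. The Python sketch below uses the standard conversion (not given in the source): divide each width by the observe frequency of its nucleus. It treats 600 MHz as the exact ¹H frequency, which is an approximation for a nominal 600 MHz instrument, and takes the ¹⁵N frequency as 600 MHz times the magnitude of the ¹⁵N/¹H gyromagnetic ratio (about 0.10136).

```python
# Spectral width conversion: width (ppm) = width (Hz) / observe frequency (MHz).
PROTON_MHZ = 600.0            # nominal 1H frequency of the spectrometer (approximation)
GAMMA_RATIO_N15_H1 = 0.10136  # |gamma(15N) / gamma(1H)|


def hz_to_ppm(width_hz: float, proton_mhz: float, gamma_ratio: float = 1.0) -> float:
    """Spectral width in ppm for a nucleus observed at gamma_ratio * proton_mhz."""
    return width_hz / (proton_mhz * gamma_ratio)


if __name__ == "__main__":
    print(f"F2 (1H):  {hz_to_ppm(10_000, PROTON_MHZ):.1f} ppm")
    print(f"F1 (15N): {hz_to_ppm(2_400, PROTON_MHZ, GAMMA_RATIO_N15_H1):.1f} ppm")
```

With these values the 10.0 kHz ¹H window spans roughly 16.7 ppm and the 2.4 kHz ¹⁵N window (¹⁵N observed near 60.8 MHz) spans roughly 39.5 ppm.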

A 1.5 mM solution of a double-labeled (¹³C,¹⁵N) preparation of apo-HCV protease (SEQ ID NO: 18) was prepared. A full set of NMR spectra was collected and used to determine the backbone NMR resonances of the apo-HCV protease. The 3D NMR experiments included HNCO, HNCACO, HNCACB, CBCACONH, HBHACONH, HNCAHA, HCCH-TOCSY, ¹⁵N-edited NOESY and ¹³C-edited NOESY (see Clore and Gronenborn, Meth. Enzymol. 239, 349-363 (1994) for references to these experiments).

Backbone NMR resonances for 155 of the 187 non-proline residues and 8 of the 11 proline residues were obtained, along with most of the side-chain assignments. These assignments include the catalytic triad residues His57, Asp81, and Ser139 and the residues spatially near them, indicating that the apo form of the protein is a good reagent for NMR analysis of protease:inhibitor complexes.

Example 9 - NMR of a complex of a modified HCV NS4a-NS3 fusion protease 
with an inhibitor 

A complex between a ¹⁵N-labeled HCV NS4a-NS3 fusion protease (SEQ ID NO: 18) and an inhibitor peptide (Ac-Asp-D-Glu-Leu-Ile-Cha-Cys-OH) (Ingallinella et al., Biochemistry 37, 8906-8914 (1998)) was prepared by forming a 1:1 complex of the two components (at 200 µM concentration) in the NMR buffer described in Example 8. A 2D HSQC spectrum was collected. The HSQC spectra of the apo- and peptide-complexed soluble modified HCV NS4a-NS3 fusion protease (SEQ ID NO: 18) are overlaid in Figure 7. Many residues, including the active-site residues, undergo chemical shift perturbations upon addition of the inhibitor.
Example 10 - Crystal Structure of a complex of modified HCV NS4a-NS3 fusion
protease with an inhibitor 

The modified HCV NS4a-NS3 fusion protease produced by Method 2 in Example 7 is well suited to support X-ray crystallographic studies. In this Example, the protein preparations included 10-20% translational readthrough product (see Example 7). The preparations produced crystals of a complex with inhibitor overnight, and a structure of the complex with inhibitor was determined to 2.1 Å resolution.

Crystals of the soluble modified HCV NS4a-NS3 fusion protease complexed with the peptidic inhibitor Ac-Asp-D-Glu-Leu-Ile-Cha-Cys-OH [IC50 = 15 nM, Ingallinella et al., Biochemistry 37, 8906-8914 (1998)] were grown by standard hanging-drop vapor-diffusion methods at room temperature. Protein solution: 21.6 mg/ml protein in 0.5 M NaCl, 25 mM MES, pH 6.5, 10% (v/v) glycerol, 2 mM DTT, and 5.64 mM inhibitor (6X molar excess), incubated at room temperature for 2 hours. Reservoir solution: 2 M ammonium sulfate, 0.1 M sodium acetate, pH 4.6, 1% (v/v) PEG monomethyl ether 350, 5 mM zinc chloride. Droplets were composed of equal-volume aliquots of protein and reservoir solutions. Crystals were obtained overnight under these conditions.
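The stated concentrations can be cross-checked against each other. A 6X molar excess at 5.64 mM inhibitor implies 0.94 mM protein, and 21.6 mg/ml at that molarity implies a molecular mass of about 23 kDa. The Python sketch below performs that back-calculation; the mass it returns is an inference from the stated numbers, not a value reported in the text, and it is in the same range as the ~20 kDa implied by "5 mM (~100 mg/ml)" in Example 7.

```python
def protein_mM_from_excess(inhibitor_mM: float, molar_excess: float) -> float:
    """Protein concentration implied by an inhibitor concentration and its molar excess."""
    return inhibitor_mM / molar_excess


def implied_mass_kDa(mg_per_ml: float, mM: float) -> float:
    """Molecular mass implied by a mass and a molar concentration.

    (mg/ml) / (mmol/l) = (g/l) / (mmol/l) = g/mmol = kg/mol, i.e. kDa.
    """
    return mg_per_ml / mM


if __name__ == "__main__":
    protein_mM = protein_mM_from_excess(5.64, 6)  # crystallization protein solution
    print(f"protein: {protein_mM:.2f} mM, implied mass: "
          f"{implied_mass_kDa(21.6, protein_mM):.1f} kDa")
    print(f"Example 7 check: {implied_mass_kDa(100, 5):.1f} kDa")  # 5 mM (~100 mg/ml)
```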

A crystal was taken from its droplet and placed in a small volume of reservoir solution that had been made 20% (v/v) in glycerol. It was extracted with a standard Hampton fiber loop mounted in a Hampton pin and immediately introduced into the 100 K nitrogen stream from an Oxford Cryosystems low-temperature device.

Data to 2 Å resolution were collected from this crystal on a rotating-anode source (Cu Kα) with an R-AXIS II detector. Completeness is 94% from 20-2 Å resolution and 64% in the outer shell (2.07-2.00 Å), with R(symm) values of 9.1% and 39.7% for all data and for the outer shell, respectively.

X-ray diffraction from this crystal indicates space group P4 x 2i2 with unit cell 
parameters a = b = 67.1 Å, c = 81.2 Å and one molecule per asymmetric unit.
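The one-molecule-per-asymmetric-unit assignment can be sanity-checked with the Matthews coefficient, a standard crystallographic calculation that the text does not itself report. A primitive tetragonal space group with 2₁ axes of the kind given (point group 422) has eight asymmetric units per cell. The molecular mass used below is an assumption: 198 residues (the 187 non-proline plus 11 proline residues counted in Example 8) at an assumed average of 110 Da per residue.

```python
# Crystal packing check from the stated tetragonal cell (Matthews coefficient).
A_ANG = 67.1                  # cell edges a = b (Å), as stated
C_ANG = 81.2                  # cell edge c (Å), as stated
Z = 8                         # asymmetric units per cell (point group 422), one molecule each
N_RESIDUES = 198              # 187 non-proline + 11 proline residues (Example 8)
MEAN_RESIDUE_MASS_DA = 110.0  # assumption: typical average residue mass


def tetragonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume (Å^3) for a tetragonal cell (a = b, all angles 90 degrees)."""
    return a * a * c


def matthews_coefficient(volume: float, z: int, mass_da: float) -> float:
    """V_M in Å^3/Da: cell volume per dalton of protein in the cell."""
    return volume / (z * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent content using the conventional 1 - 1.23 / V_M relation."""
    return 1.0 - 1.23 / vm


volume = tetragonal_cell_volume(A_ANG, C_ANG)
vm = matthews_coefficient(volume, Z, N_RESIDUES * MEAN_RESIDUE_MASS_DA)

if __name__ == "__main__":
    print(f"V = {volume:,.0f} Å^3, V_M = {vm:.2f} Å^3/Da, "
          f"solvent ~ {100 * solvent_fraction(vm):.0f}%")
```

Under these assumptions V_M comes out near 2.1 Å³/Da (about 41% solvent), within the range commonly observed for protein crystals; a different mass estimate would change both values.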

The structure was solved by standard molecular-replacement techniques with an NS4a-NS3 fusion search model based upon a previously reported structure of an HCV protease:NS4a complex whose coordinates are on deposit with the Protein Data Bank [UXP.pdb as described in Protein Sci. 7, 837-847 (1998)].
Refinement of the structure by XPLOR with data from 20.0-2.1 Å resolution and F/σ(F) > 1.0 was suspended at an R-factor of 19.7% and an R-free of 27.6%. The current model (1) includes about 120 water molecules, of which about 2/3 were added by an automated routine and have not been checked in the electron density map, (2) lacks modeled residues for loops at 0-4 and 87-89, for which density is not clear, and (3) includes no alternate conformations for side chains, although several were found.
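For reference, the statistics quoted above follow the conventional crystallographic definitions, reproduced here in their standard textbook form (the source does not spell them out). R(symm) measures agreement among symmetry-related or repeated intensity measurements; the R-factor measures agreement between observed and calculated structure-factor amplitudes; R-free is the same R expression evaluated only over a test set of reflections withheld from refinement, whose size the text does not state.

```latex
R_{\mathrm{symm}} = \frac{\sum_{hkl}\sum_{i}\bigl\lvert I_i(hkl) - \langle I(hkl)\rangle\bigr\rvert}
                         {\sum_{hkl}\sum_{i} I_i(hkl)},
\qquad
R = \frac{\sum_{hkl}\bigl\lvert\,\lvert F_{\mathrm{obs}}(hkl)\rvert - \lvert F_{\mathrm{calc}}(hkl)\rvert\,\bigr\rvert}
         {\sum_{hkl}\lvert F_{\mathrm{obs}}(hkl)\rvert}
```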

Electron density for the entire inhibitor is clearly seen, and its conformation and that of its binding site on the protein are unambiguously defined, making the structure immediately useful for drug design purposes. The quality of the inhibitor map can be seen in Figure 8. The inhibitor binds entirely on the P side of the active site, and the carboxylate at its C-terminus binds in a pocket that corresponds to the oxyanion hole of the classical serine protease active site. Away from the active site, a zinc ion is coordinated by the side chains of cysteines at 97, 99, and 145, and by a water molecule, although the identity of this latter ligand is questioned. Because the B factor of this water refines to 2 Å² and there is a large residual peak in the difference map at its position, it is suspected of being another component of the crystallization fluid, and we have refined it as a chloride ion.

The above examples are illustrative and do not limit the claims. 